
Arya Koureshi
401204008

Learning to predict where humans look

For many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. Where eye-tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, Judd et al. collected eye-tracking data from 15 viewers on 1003 images and used this database as training and testing examples to learn a model of saliency based on low-, mid-, and high-level image features. This large database of eye-tracking data is publicly available with the paper (Tilke Judd):

http://people.csail.mit.edu/tjudd/WherePeopleLook/index.html

1. Eye tracking database

We collected a large database of eye tracking data to allow large-scale quantitative analysis of fixation points and gaze paths and to provide ground truth data for saliency model research. The images, eye tracking data, and accompanying code in Matlab are all available. In order to extract clusters of fixation locations and see the locations overlaid on the image, do the following:

>> showEyeData([path to eye data for one user], [path to original images])
>> showEyeData('../DATA/hp', '../ALLSTIMULI')

You can also run

>> showEyeDataAcrossUsers([path to images], [number of fixations per person to show])
>> showEyeDataAcrossUsers('../ALLSTIMULI', 6)

This shows the centers of the eye-tracking fixations for all users on a specific image. The fixations are color-coded per user and indicate the order of fixations. Press space to continue to the next image. You can see information about the fixations in ‘Eye tracking database\DATA\hp’:

i1000978947.DATA.eyeData

This file contains the raw x and y pixel locations of the eye for every sample the eye tracker took. An eye tracker on a separate computer recorded each viewer's gaze path as they viewed each image at full resolution for 3 seconds (725 samples), separated by 1 second of viewing a gray screen. To separate bottom-up from top-down computation, divide the samples into two groups: the first 1.5 s and the second 1.5 s.
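The split into first and second halves amounts to cutting each 3-second recording in two. A minimal Python sketch, assuming the eyeData samples have been loaded as an N×2 array of x, y pixel locations (the function name and dummy data are illustrative, not part of the released MATLAB code):

```python
import numpy as np

def split_eye_samples(samples):
    """Split raw eye-tracker samples (N x 2 array of x, y pixel
    locations over a 3-second viewing) into two halves: the first
    1.5 s (mostly bottom-up) and the second 1.5 s (mostly top-down)."""
    samples = np.asarray(samples)
    half = len(samples) // 2          # 725 samples -> 362 / 363
    return samples[:half], samples[half:]

# Dummy data standing in for a file like i1000978947.DATA.eyeData:
dummy = np.column_stack([np.arange(725), np.arange(725)])
first_half, second_half = split_eye_samples(dummy)
```

Each half can then be turned into its own fixation map and compared against the saliency maps separately.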


2. Saliency model

To get the feature-extraction code to work, see ‘Our saliency model\JuddSaliencyModel\JuddSaliencyModel\README.txt’ and install the required tools.

saliency.m is the code that loads an image and returns a saliency map. It computes all the necessary features (the find*.m files) and then loads model.mat to get the weights for combining the features in a linear model.

As you can see in saliency.m, there are 7 groups of FEATURES. To create a saliency map for a single group, zero out the weights of every other feature group and run saliency.m; repeating this for each group yields 7 different saliency maps. Which features does each FEATURES group compute?
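The linear model can be pictured as a weighted sum of per-pixel feature maps, and isolating one group is just zeroing the other weights. A hedged Python sketch: the group names follow the figure labels below, but the maps and weights here are random stand-ins, not the trained values from model.mat:

```python
import numpy as np

def combine_features(feature_maps, weights):
    """Linear saliency model: per-pixel weighted sum of feature maps.
    feature_maps: dict name -> 2D array (all the same shape).
    weights:      dict name -> scalar weight."""
    out = np.zeros_like(next(iter(feature_maps.values())), dtype=float)
    for name, fmap in feature_maps.items():
        out += weights.get(name, 0.0) * fmap
    return out

names = ["subband", "itti", "color", "torralba", "horizon", "object", "center"]
rng = np.random.default_rng(0)
maps = {n: rng.random((4, 4)) for n in names}   # dummy feature maps
weights = {n: 1.0 for n in names}               # dummy weights

# Isolate one group (e.g. "itti") by zeroing every other weight:
itti_only = {n: (1.0 if n == "itti" else 0.0) for n in names}
itti_map = combine_features(maps, itti_only)
```

With only one nonzero weight, the output map is just that group's feature map (up to scaling), which is exactly what the per-feature maps in the figures show.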


3. Comparing saliency maps to fixations

We now have 7 different saliency maps and 2 fixation maps (first and second 1.5 s) for each example image. Compute the ROC score between each saliency map and each fixation map. This gives 2 ROC values per FEATURES group, which indicate whether that group reflects bottom-up or top-down computation.

For the ROC calculation, see ‘gbvs\gbvs\readme’, line 163.
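The ROC score treats the saliency map as a binary classifier of fixated vs. non-fixated pixels, sweeping a threshold over saliency values. A rough Python equivalent of this idea (a common pairwise AUC formulation, not the exact gbvs code):

```python
import numpy as np

def roc_auc(saliency, fixation_mask):
    """Area under the ROC curve: saliency values at fixated pixels
    (positives) vs. all other pixels (negatives)."""
    pos = saliency[fixation_mask]
    neg = saliency[~fixation_mask]
    # Fraction of (pos, neg) pairs where the fixated pixel scores
    # higher; ties count half.  0.5 = chance, 1.0 = perfect.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: saliency is high exactly where the fixations fall.
sal = np.array([[0.9, 0.1],
                [0.2, 0.8]])
fix = np.array([[True, False],
                [False, True]])
auc = roc_auc(sal, fix)
```

Running this per feature group, per fixation-map half, gives the 2 ROC values per FEATURES group described above.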


Eye tracking database

[Figure: fixation overlays for one user (panels 1–3) and across all users (panels 4–6)]


Saliency model

The saliency map is generated by combining three groups of features: low-level, mid-level, and high-level. These features are extracted and combined linearly to create the saliency map.

Low-Level Features

The low-level features include steerable-pyramid subband energies (Subband), the Itti & Koch channels (Itti), color features (Color), and Torralba saliency (Torralba).

Mid-Level Features

The mid-level feature group consists of a horizon-line detector (Horizon).

High-Level Features

The high-level feature group involves object detectors, such as faces and people (Object); a center prior (Center) completes the seven feature groups.

These feature groups are individually processed and combined to produce the saliency map, providing insights into the most visually salient regions within an image.

[Figure: pic #100 — saliency maps for each feature group (Subband, Itti, Color, Torralba, Horizon, Object, Center) and for all features combined]

The saliency map is generated by setting all weights, except for the one corresponding to the desired feature, to zero. The blue dot represents the initial eye location, while the red dot indicates the final eye location.

[Figure: pic #200 — saliency maps for each feature group (Subband, Itti, Color, Torralba, Horizon, Object, Center) and for all features combined]

The saliency map is generated by setting all weights, except for the one corresponding to the desired feature, to zero. The blue dot represents the initial eye location, while the red dot indicates the final eye location.

[Figure: pic #400 — saliency maps for each feature group (Subband, Itti, Color, Torralba, Horizon, Object, Center) and for all features combined]

The saliency map is generated by setting all weights, except for the one corresponding to the desired feature, to zero. The blue dot represents the initial eye location, while the red dot indicates the final eye location.

In the figures, it is evident that because each trial begins with a central fixation point, the eye position consistently starts from the center of the image. When faces are present in the image, the subjects predominantly direct their gaze toward the faces first, before attending to other details. Subjects thus tend to prioritize high-level features such as faces and objects before examining finer details and low-level features.


Different Saliency Maps for different subjects

[Figure: three example images — original pictures plus per-subject fixation maps for subjects ajs, CNG, emb, ems, ff, hp, jcw, kae]


Comparing saliency maps to fixations

[Figure: mean ROC score for each of the 7 feature groups]

These figures were created by two files. As you can see, all feature groups have a high mean ROC except the object feature. The initial sets of features exhibit high ROC values, signifying a significant impact, whereas the object feature displays low ROC values, indicating that this group contributes much less to explaining the fixations than the others. This is consistent with the previous results, where the object feature had little effect.